Collecting humorous expressions from a community-based question-answering-service corpus
Abstract
We propose a method for collecting humorous expressions from an online community-based question-answering (CQA) corpus, in which some users post questions on a variety of topics and other users post answers. Although the service was created for knowledge exchange, some users enjoy posting humorous responses. The corpus therefore contains many interesting examples of humorous communication that may be useful for understanding the nature of online communication and the varieties of humour. Because the corpus contains 3,116,009 topics, the collection process needs to be automated. However, humorous expressions depend on context, so they are hard to collect automatically using keywords or key phrases. Our method uses natural language processing based on dissimilarity criteria between answer texts. With this method, humorous expressions can be collected more efficiently than by manual exploration: 30 times more examples per hour.
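As a rough illustration of what a dissimilarity criterion between answer texts could look like, the sketch below scores each answer in a thread by its mean Jaccard distance to the other answers and ranks the most dissimilar answers first as candidates for manual review. The abstract does not say which dissimilarity measure, tokenizer, or threshold the authors used, so the Jaccard distance, the regex tokenizer, the function names, and the sample thread are all assumptions made for this example.

```python
import re


def tokens(text):
    """Lowercase word tokens as a set (a deliberately crude tokenizer; assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rank_by_dissimilarity(answers):
    """Return (score, answer) pairs, most dissimilar first.

    The score is the mean Jaccard distance from one answer to every other
    answer in the same thread. This is a stand-in measure, not the paper's.
    """
    token_sets = [tokens(a) for a in answers]
    scored = []
    for i, current in enumerate(token_sets):
        distances = [
            jaccard_distance(current, other)
            for j, other in enumerate(token_sets)
            if j != i
        ]
        score = sum(distances) / len(distances) if distances else 0.0
        scored.append((score, answers[i]))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical thread: two ordinary answers and one off-beat answer.
thread_answers = [
    "Restart the router and check the cable.",
    "Check the cable, then restart the router.",
    "Have you tried offering the router a cup of tea?",
]
ranked = rank_by_dissimilarity(thread_answers)
for score, answer in ranked:
    print(f"{score:.3f}  {answer}")
```

In this toy thread the two ordinary answers share most of their vocabulary, so the off-beat answer receives the highest score. A high score only flags an answer as unusual; whether it is actually humorous would still need a human judgment.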
Similar resources
Human Judgment on Humor Expressions in a Community-Based Question-Answering Service
Understanding humorous dialogue requires a collection of humorous expressions. Beyond the expressions themselves, annotations are important if the collection is to be used as a language resource. In this paper, we analyzed how human assessors annotate humorous expressions extracted from an online community-based question-answering (CQA) corpus, which contains many interesting examples of humorous commun...
Presenting a Religious Question-Answering Corpus in the Persian Language
Question answering is a field of natural language processing and information retrieval that has attracted researchers' attention in recent decades. Growing interest in this field has created a need for appropriate data sources. Most work on developing question-answering corpora has so far been done for English, but in other languages such as Persian, the lack of these co...
Corpus-based Question Answering for why-Questions
This paper proposes a corpus-based approach for answering why-questions. Conventional systems use hand-crafted patterns to extract and evaluate answer candidates. However, such hand-crafted patterns are likely to have low coverage of causal expressions, and it is also difficult to assign suitable weights to the patterns by hand. In our approach, causal expressions are automatically collected fr...
Which IR Model Has a Better Sense of Humor? Search over a Large Collection of Jokes
This paper describes experiments on humorous response generation for short text conversations. Firstly, we compiled a collection of 63,000 jokes from online social networks (VK and Twitter). Secondly, we implemented several context-aware joke retrieval models: BM25 as a baseline, query term reweighting, word2vec-based model, and learning-to-rank approach with multiple features. Finally, we eval...
The A'lam Corpus: A Standard Named-Entity Corpus for the Persian Language
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used in text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. An NER system is tasked with determining the boundaries of each named entity, recognizing its type, and classifying it into predefined categories. The categories of named...